Alerts
Centurion uses Opsgenie alerts that are generated by both the centurionV2-workers and centurionV2-nodejs Node.js instances. Alerts provide instant awareness of conditions that require immediate attention. Centurion relies heavily on the mobile app notifications sent by Opsgenie.
Saved searches
There is a short list of saved searches to help admins navigate open alerts. They appear in the sidebar of the Alerts list page. Admins are encouraged to create their own saved searches and share them with others if they are useful to the monitoring team.
- open (dev)
- open (prod)
- snoozed (prod)
Update counter
Alerts are generated by workers. A worker runs on a regular timed cycle. With each cycle of the worker the alert is updated if the condition still exists. A counter located at the beginning of the alert's title is incremented each time the alert is updated.
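The counter behavior can be sketched as follows. This is only an illustration: the `[n]` prefix format and the helper name `bumpTitleCounter` are assumptions, since the actual title format used by Centurion is not shown in these docs.

```javascript
// Hypothetical helper: the "[n] " prefix format is an assumption,
// not necessarily the format Centurion writes into Opsgenie titles.
function bumpTitleCounter(title) {
  const match = /^\[(\d+)\]\s*(.*)$/s.exec(title);
  // No counter yet: treat this as the first update after creation.
  if (!match) return `[1] ${title}`;
  const next = Number(match[1]) + 1;
  return `[${next}] ${match[2]}`;
}

console.log(bumpTitleCounter("Fallback wallet balance low"));     // [1] Fallback wallet balance low
console.log(bumpTitleCounter("[7] Fallback wallet balance low")); // [8] Fallback wallet balance low
```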
Current alerts
An alert is current until the next run cycle of its worker marks it as closable.
🔴 THIS ALERT CAN BE CLOSED 🔴
This message indicates that the worker has determined that the conditions of the alert, shown below the message, no longer exist. As such the alert can be closed. However, if the alert is not closed before the worker's next run cycle and the conditions return, the message is removed and the alert's description is updated. The red icon 🔴 also appears on the alert list view page for each alert that is closable.
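One way to picture the banner being added and removed between run cycles is the sketch below. It assumes the banner is stored as the first line of the alert description; the function name `applyClosableBanner` is hypothetical and not taken from the Centurion code.

```javascript
// Assumption: the closable banner is the first line of the description.
const CLOSABLE_BANNER = "🔴 THIS ALERT CAN BE CLOSED 🔴";

function applyClosableBanner(description, conditionStillExists) {
  // Strip any banner left behind by a previous run cycle.
  const body = description.startsWith(CLOSABLE_BANNER)
    ? description.slice(CLOSABLE_BANNER.length).replace(/^\n+/, "")
    : description;
  // Condition is present (or has returned): show the description without the banner.
  if (conditionStillExists) return body;
  // Condition has cleared: mark the alert as closable.
  return `${CLOSABLE_BANNER}\n\n${body}`;
}
```

Because the banner is stripped before it is re-applied, calling the function on consecutive run cycles never stacks more than one banner.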
Resolving alerts
If an alert is closed or deleted without resolving the issue, a new alert will be generated shortly depending on the frequency of execution of the worker behind the alert type. Therefore accidental deletion or closing of alerts is not entirely critical.
Resolving issues is part of the training documentation in the tasks repo.
Snooze alerts
Centurion updates any alert that is still open with each run cycle of the worker that created it. Additionally, the alertsWorker runs every 90 seconds to determine whether each alert can be closed. This can put pressure on the API rate limits imposed by Opsgenie.
After the @api3/contracts@10.0.0 upgrade, many alerts were created for fallback wallets. The alerts were not processed immediately and remained open after they were created. These alerts are updated constantly with each run cycle of the worker that created them. As a result, Opsgenie has started to impose rate limiting on Centurion. Centurion is the largest consumer of Opsgenie calls at API3.
To reduce the number of API calls for these idle alerts, Centurion can leverage the "Snooze mode" of an Opsgenie alert. In doing so, the frequent alert updates would be halted as long as the alert is tagged as snoozed. Additionally, the alertsWorker, which determines whether the alert can be closed, will ignore the alert as well.
Some alerts, like those from the signedDataWorker, have rolling data that can change from one run cycle to another. They are not good candidates for snooze.
Others such as walletsWorker, pricesWorker, and historyWorker focus on a single entity that never changes except for some supporting numbers. These can be good candidates to tag as snoozed. Example: fallback wallets that are going to be updated at a future date.
Snoozing an alert requires a good reason, and a snoozed alert must eventually be closed or have its snooze removed. Snoozed alerts will still show up in the list of open alerts for visibility.
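The effect of the snooze tag on a worker's update pass can be sketched as below. The tag name `snoozed`, the `tags` array on each alert object, and both function names are assumptions for illustration only.

```javascript
// Assumptions: each alert object carries a `tags` array and the tag is named "snoozed".
const SNOOZE_TAG = "snoozed";

function isSnoozed(alert) {
  return Array.isArray(alert.tags) && alert.tags.includes(SNOOZE_TAG);
}

// Split open alerts into those a worker keeps updating and those it leaves alone.
function partitionBySnooze(openAlerts) {
  const toUpdate = [];
  const skipped = [];
  for (const alert of openAlerts) {
    (isSnoozed(alert) ? skipped : toUpdate).push(alert);
  }
  return { toUpdate, skipped };
}
```

Only the alerts in `toUpdate` would trigger Opsgenie API calls, which is where the rate-limit savings would come from.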
Alert Templates
All Centurion workers generate one or more alerts. Each alert is associated with a template. Alerts describe a critical condition and provide instructions for a remedy. The instructions are not repeated in these docs.
Each template provides historical data for an alert until the alert is closed. The historical data is generated anew with each run cycle of the worker. The data is stored in the Centurion database and is presented as a link in the alert's description.
chainsWorker
Syncs chains from the @api3/chains package with the CHAINS database table.
There is one alert template for this worker.
module.exports = {
createChainAlertObj: createChainAlertObj,
};
createChainAlertObj
Generates an alert when a chain is found in the database but not in @api3/chains, or vice versa.
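The two-way comparison can be sketched as a set difference in each direction. The input shape (plain arrays of chain identifiers) and the name `diffChains` are assumptions; the real worker may compare richer records.

```javascript
// Hypothetical inputs: arrays of chain identifiers from each source.
function diffChains(dbChains, packageChains) {
  const inDb = new Set(dbChains);
  const inPackage = new Set(packageChains);
  return {
    // Present in the CHAINS table but missing from the package.
    onlyInDatabase: [...inDb].filter((c) => !inPackage.has(c)),
    // Present in the package but missing from the CHAINS table.
    onlyInPackage: [...inPackage].filter((c) => !inDb.has(c)),
  };
}
```

An alert would be warranted whenever either list is non-empty.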
feedsWorker
Syncs feeds from the @api3/dapi-management package with the FEEDS database table.
module.exports = {
createFeedAlertObj: createFeedAlertObj,
};
createFeedAlertObj
Generates an alert when a feed is found in the database but not in @api3/dapi-management, or vice versa.
pricesWorker
Gathers on-chain data (values, timestamps, and parameters) from each chain's AirseekerRegistry contract. The data is placed into the PRICES Postgres database table.
There are two alert templates for this worker.
module.exports = {
TS_CYCLE: TS_CYCLE,
createFeedsMismatchAlertObj: createFeedsMismatchAlertObj,
createDataUndefinedAlertObj: createDataUndefinedAlertObj,
};
createFeedsMismatchAlertObj
Generates an alert when an active feed (by its dAPI name) exists on-chain but not in the database. This indicates a mismatch between the @api3/dapi-management package and AirseekerRegistry.
createDataUndefinedAlertObj
When the prices worker runs, it calls ./airseeker.js. The returned dataset is an array of feed data, or an empty array; if airseeker.js failed, undefined is returned instead. The most likely cause of a failure is that all RPC provider URLs failed. However, other conditions may be responsible. airseeker.js will have updated Loki with the cause.
This template raises an alert for each chain that returned an undefined dataset (undefined array). The possible causes are listed in the alert's instructions.
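The distinction between an empty array and `undefined` is the key point here, and it can be sketched as follows. The map-of-chains input shape and the name `chainsWithUndefinedData` are assumptions for illustration.

```javascript
// Hypothetical shape: an object mapping chain name -> dataset returned for that chain.
function chainsWithUndefinedData(resultsByChain) {
  return Object.entries(resultsByChain)
    // Only `undefined` signals a failure; an empty array is a valid "no feeds" result.
    .filter(([, dataset]) => dataset === undefined)
    .map(([chain]) => chain);
}
```

A strict comparison with `undefined` is used on purpose, so that an empty array (which is falsy-adjacent in many ad hoc checks) is never mistaken for a failure.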
signedDataWorker
Validates the data from API providers. It checks for deviation from the median value and for stale timestamps. Note that this worker has a config.json file that is used to ignore some feeds during its execution.
There are five alert templates for this worker.
module.exports = {
createDeviationAlertObj: createDeviationAlertObj,
createTimestampAlertObj: createTimestampAlertObj,
createProvideUrlsAlertObj: createProvideUrlsAlertObj,
createExtractDataAlertObj: createExtractDataAlertObj,
createCriticalDeviationAlertObj: createCriticalDeviationAlertObj,
};
createDeviationAlertObj
All provider data, for each feed, is checked for deviation from the median value.
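A minimal sketch of a median-based deviation check is shown below. It assumes deviation is measured as a percentage of the median and that a threshold is supplied by the caller; `thresholdPercent` and both function names are hypothetical, and the worker's real formula may differ.

```javascript
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

// Assumption: deviation is a percentage of the median, and
// `thresholdPercent` is a hypothetical caller-supplied limit.
function findDeviatingProviders(valuesByProvider, thresholdPercent) {
  const med = median(Object.values(valuesByProvider));
  // Relative deviation is undefined when the median is zero.
  if (med === 0) return [];
  return Object.entries(valuesByProvider)
    .filter(([, value]) => (Math.abs(value - med) / Math.abs(med)) * 100 > thresholdPercent)
    .map(([provider]) => provider);
}
```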
createTimestampAlertObj
The timestamp of provider data is checked to ensure that it has been updated within a reasonable time span.
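A staleness check of this kind can be sketched as a simple age comparison. The use of Unix seconds and the `maxAgeSeconds` parameter are assumptions; the docs do not state what the worker considers a reasonable time span.

```javascript
// Assumptions: timestamps are Unix seconds and `maxAgeSeconds` is a hypothetical limit.
function isStale(timestampSeconds, maxAgeSeconds, nowSeconds = Math.floor(Date.now() / 1000)) {
  return nowSeconds - timestampSeconds > maxAgeSeconds;
}
```

Passing `nowSeconds` explicitly keeps the check deterministic, which is convenient when one run cycle evaluates many providers against the same reference time.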
createProvideUrlsAlertObj
Creates a list of API provider URLs that failed to respond.
createExtractDataAlertObj
Creates a list of errors, from each provider, when the timestamp or value of a feed failed to decode after the data was successfully retrieved.
createCriticalDeviationAlertObj
Sends a P1 alert for any feed that has 30% or more of its API providers with a price that is out of deviation in comparison to its peer providers.
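The "30% or more" rule can be sketched with integer arithmetic to avoid floating-point edge cases at the boundary. The function name and its two count parameters are hypothetical; how the worker decides that a single provider is out of deviation is not covered here.

```javascript
// `deviatingCount` and `totalProviders` would come from a per-feed deviation check.
// The comparison is equivalent to deviatingCount / totalProviders >= 0.30.
function isCriticalDeviation(deviatingCount, totalProviders) {
  if (totalProviders === 0) return false;
  return deviatingCount * 10 >= totalProviders * 3;
}
```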
symbolsWorker
Updates the database table SYMBOLS with the known feed names (API3/USD) from all validators. These symbols are only for diagnostic reference in the Centurion UI.
The worker adds and removes validators to each feed based on each validator's ability to return a feed value with each run cycle.
- api-provider-signed-data
- nodary-signed-api (aggregates from api-provider-signed-data)
- redstone (independent oracle)
- pyth (independent oracle)
This worker does not generate alerts.
ttlWorker
The Centurion database is not a long-term historical database. Many tables are flushed of their data after a specified period of time.
This worker does not generate alerts.
walletsWorker
module.exports = {
createWalletAlertObj,
createFallbackWalletAlertObj,
createWalletDataRetrievalErrorObj,
createWalletDataProcessingErrorObj,
};
createWalletAlertObj
The sponsor wallet for a named feed and chain has balance issues.
createFallbackWalletAlertObj
The fallback wallet for a named chain has balance issues.
createWalletDataRetrievalErrorObj
Failures to retrieve wallet data for a chain. This could indicate issues with RPC providers or contract interactions.
createWalletDataProcessingErrorObj
Failures to process wallet data for a feed on any one chain. This could indicate issues with data format or unexpected values.
historyWorker
This worker is in the centurionV2-nodejs project. It finds deviation and heartbeat issues in the database metadata created by workers in the centurionV2-workers project. This worker then updates history data in the database, which is used to populate the dashboard and charts in the Centurion UI.
Alerts are not raised for deviation or heartbeat feed/chain pairs on testnets.
createWorkersDownObj
When sending alerts, the history worker pings the centurionV2-workers instance to verify it is alive. This prevents a runaway flood of alerts, because if the workers are not running there will be many failures.
This alert can be triggered when the network is down or the centurionV2-workers instance is restarted. The main concern is that the centurionV2-workers instance at Heroku has terminated.
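The ping-before-send guard can be sketched as below. Both `pingWorkers` and `sendAlerts` are injected, hypothetical functions, so no real endpoint or Opsgenie call is assumed; the `{ type: "workersDown" }` object is likewise only a placeholder for the real alert.

```javascript
// Hypothetical sketch: `pingWorkers` resolves to true when the workers instance responds.
async function sendAlertsIfWorkersAlive(pingWorkers, sendAlerts, alerts) {
  let alive = false;
  try {
    alive = await pingWorkers();
  } catch {
    // A failed ping is treated the same as "not alive".
    alive = false;
  }
  if (!alive) {
    // Raise one "workers down" alert instead of many downstream failures.
    await sendAlerts([{ type: "workersDown" }]);
    return "workersDown";
  }
  await sendAlerts(alerts);
  return "sent";
}
```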
createHistoryAlertObj
This P1 alert is raised when a feed/chain pair is out of deviation or its heartbeat has been exceeded. The deviation is calculated between the values from AirseekerRegistry and the internal validator signed_data.
A key feature of this alert template is a link to the Airseeker Loki logs that will display the records around the last updated timestamp of the alert. The records returned are filtered and only contain those for the feed and chain in question.
- Deviation: The pair has been out of deviation for 3 consecutive run cycles of the historyWorker and the on-chain price of the pair has not changed.
- Heartbeat: The pair's heartbeat has been exceeded for 3 consecutive run cycles of the historyWorker.
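The "3 consecutive run cycles" requirement in the conditions above can be sketched as a streak counter per feed/chain pair. This sketch keeps state in an in-memory `Map` for brevity, which is an assumption; the names `createStreakTracker` and `recordCycle` are hypothetical, and any extra condition (such as the on-chain price not changing) is expected to be folded into `conditionMet` by the caller.

```javascript
const REQUIRED_CYCLES = 3;

// Assumption: in-memory state keyed by a "feed|chain" string.
function createStreakTracker() {
  const streaks = new Map();
  return function recordCycle(key, conditionMet) {
    // Any healthy cycle resets the streak to zero.
    const streak = conditionMet ? (streaks.get(key) ?? 0) + 1 : 0;
    streaks.set(key, streak);
    // True only once the condition has held for the required consecutive cycles.
    return streak >= REQUIRED_CYCLES;
  };
}
```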
createDeviationOraclesAlertObj
This P5 alert is raised when a feed/chain pair is out of deviation with all external validators. The check is only made if there is more than one external validator assigned to the pair.
- Deviation: The pair has been out of deviation for 3 consecutive run cycles of the historyWorker and the on-chain price of the pair has not changed.
Global event handler
The global event handler is not a worker but does have a need to send Opsgenie alerts in the event of uncaught errors.
createGlobalErrorHandlerAlertObj
This alert is fired by any global event handler in index.js of centurionV2-workers. Sometimes third-party packages such as ethers fail to trap their own errors. This can leave a worker in a stuck state, as the package never returns control (via promises) to the Centurion code. Workers do have a built-in mechanism to recover from this. However, knowing it happened is important, as a restart of Node.js is often appropriate.
module.exports = {
createGlobalErrorHandlerAlertObj,
};
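For context, process-level handlers of this kind are typically registered on Node.js's `uncaughtException` and `unhandledRejection` events. The sketch below is an assumption about the general shape, not the contents of index.js; `sendAlert` is a hypothetical function that would forward to Opsgenie, and the `proc` parameter exists only so the wiring can be exercised without touching the real process.

```javascript
// Assumption: `sendAlert` is a hypothetical function that forwards to Opsgenie.
function registerGlobalHandlers(sendAlert, proc = process) {
  const report = (kind) => (err) => {
    // Rejections can carry non-Error values, so normalize to a string message.
    const message = err instanceof Error ? err.message : String(err);
    sendAlert({ kind, message });
  };
  proc.on("uncaughtException", report("uncaughtException"));
  proc.on("unhandledRejection", report("unhandledRejection"));
}
```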